Learning to Create Customized Authority Lists

Authors

  • Huan Chang
  • David Cohn
  • Andrew McCallum

Abstract

The proliferation of hypertext and the popularity of Kleinberg's HITS algorithm have brought about an increased interest in link analysis. While HITS and its older relatives from bibliometrics provide a method for finding authoritative sources on a particular topic, they do not allow individual users to inject their own opinions on what sources are authoritative. This paper presents a technique for learning a user's internal model of authority. We present experimental results based on the Cora on-line index, a database of approximately one million on-line computer science literature references.

Introduction

Bibliometrics (White & McCain; Small) involves studying the structure that emerges from sets of linked documents. Traditionally, these links have taken the form of citations among journal articles, although Kleinberg and others (e.g., Brin & Page) have found that its techniques adapt well to sets of hyperlinked documents. Bibliometric techniques exist for identifying subject areas, research specialties, and influential contributions to a corpus by examining correlations in the references or links between documents in the corpus.

Kleinberg's algorithm, Hypertext-Induced Topic Selection (HITS), calculates the principal eigenvectors of the document link matrix and ranks the authority of a document by the magnitude of its projection onto these eigenvectors. A deficiency of this approach is that, while the most popular (and thus most heavily linked) documents may be authoritative in many senses, they may not correspond to a particular user's internal model of authority. A user may have the opinion, for example, that a particular document X is truly an authority, or that anything written by author Y is worthless. What this user would like is a measure of document authority that coincides with their own preconceived notions of what is important.

In this paper we present a technique that learns from a small amount of user feedback to realign the eigenvectors of the link matrix, effectively recalibrating the measure of authority to correspond more closely to the user's internal model. Our technique is similar to relevance feedback (van Rijsbergen), but instead of manipulating a query to learn the relevance of the retrieved documents, we manipulate the weighting of the link matrix to learn their authority.

We demonstrate our algorithm on the problem of identifying subjectively authoritative computer science research papers. We first demonstrate how it can lift the authority of research papers in a particular sub-discipline, and then show how it can be used to automatically create user-specific "top ten" lists.

The Authority of a Document

In this section we first describe how the authority of a document is computed with respect to a corpus, following the conventions of the HITS algorithm. We then describe the application of this method to a document corpus and discuss some of the limitations we have observed with its use. The following section describes our "lifting" algorithm, which addresses some of these limitations.

Hypertext-induced topic selection is performed as follows. Given a set of documents, a link matrix M specifies their connectivity: element M_ij is non-zero if and only if there is a reference or link from document i to document j. Typically M_ij is set to 1 or, in some variations, to 1 over the total number of references the document makes.

For a given document i in the set, let a_i and h_i be the authority and hub scores, respectively. These scores are real numbers greater than or equal to zero and have the following interpretation: a large hub score means the document points to many good authorities; a large authority score means it is pointed to by many good hubs. This recursive definition leads to a set of linear equations: a_i = Σ …
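
The excerpt breaks off before the linear equations are written out, so the following is only a minimal sketch of how hub and authority scores are commonly computed, not the authors' implementation. It assumes the standard HITS mutual-reinforcement updates (each authority score is the sum of the hub scores of the documents linking to it, and each hub score is the sum of the authority scores of the documents it links to), a binary link matrix, and Euclidean normalization after every step. The function name `hits`, its parameters, and the toy corpus are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def hits(links, n_docs, iterations=50):
    """Power iteration for hub and authority scores (illustrative sketch).

    links: iterable of (i, j) pairs meaning document i references document j,
           i.e. M[i][j] = 1 in the link matrix (binary weighting is assumed).
    Returns (authority, hub): two lists of non-negative floats.
    """
    links = list(links)
    authority = [1.0] * n_docs
    hub = [1.0] * n_docs

    def normalize(scores):
        # Scale to unit Euclidean length; leave an all-zero vector untouched.
        norm = sqrt(sum(s * s for s in scores))
        return [s / norm for s in scores] if norm > 0 else scores

    for _ in range(iterations):
        # Authority of j: sum of hub scores of every document i linking to j.
        new_authority = [0.0] * n_docs
        for i, j in links:
            new_authority[j] += hub[i]
        authority = normalize(new_authority)

        # Hub score of i: sum of authority scores of every document j it cites.
        new_hub = [0.0] * n_docs
        for i, j in links:
            new_hub[i] += authority[j]
        hub = normalize(new_hub)

    return authority, hub


# Hypothetical toy corpus: documents 0, 1 and 2 cite document 3; 0 also cites 2.
example_links = [(0, 3), (1, 3), (2, 3), (0, 2)]
authority, hub = hits(example_links, n_docs=4)
ranking = sorted(range(4), key=lambda d: authority[d], reverse=True)
print(ranking[:2])  # [3, 2]
```

In this toy corpus, document 3 is cited by every other document and so receives the largest authority score, while document 0, which cites both authorities, receives the largest hub score. Repeating the two updates amounts to power iteration, which is one way to obtain the principal eigenvector that the text refers to.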




Publication date: 2000